AWS Databases & Analytics
Zhengliang Wang edited at Sat Jun 29 2024

Shared Responsibility on AWS

  • quick provisioning, high availability, vertical and horizontal scaling
  • automated backup and restore, automated upgrades
  • monitoring and alerting

Relational

RDS

  • automated provisioning, OS patching
  • continuous backup and restore to specific timestamp
  • monitoring dashboard
  • read replica
  • multi AZ for disaster recovery
  • maintenance windows for upgrades
  • scaling capability (vertical and horizontal)
  • storage backed by EBS

Amazon Aurora

  • proprietary AWS technology (not open source)
  • compatible with PostgreSQL and MySQL
  • Aurora is optimized for the AWS cloud; AWS claims a 5x performance improvement over MySQL on RDS and 3x over PostgreSQL on RDS
  • Aurora costs more than RDS (about 20% more) but is more efficient
  • Not in free tier

Amazon Aurora Serverless

  • Automated database instantiation and auto-scaling based on actual usage
  • PostgreSQL and MySQL are both supported as Aurora Serverless DB
  • No capacity planning needed
  • least management overhead
  • pay per second, can be more cost-effective
  • use case: good for infrequent, intermittent workload

Deployment: Read replicas, Multi-AZ

  • read replicas serve reads; all writes go to the main RDS instance
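
A minimal sketch of the read/write split described above, assuming the application decides per statement which endpoint to use. The `EndpointRouter` class and the hostnames are hypothetical and only illustrate the idea; they are not part of any AWS SDK.

```python
import itertools


class EndpointRouter:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica exists.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Treat anything that is not a plain SELECT as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


router = EndpointRouter(
    primary="primary.example.internal",        # hypothetical hostname
    replicas=["replica-1.example.internal",    # hypothetical hostnames
              "replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

Reads rotate across the replicas in round-robin order, while every write lands on the single primary.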

ElastiCache

  • Redis or Memcached
  • in memory database

DynamoDB

  • fully managed, highly available with replication across 3 AZs
  • NoSQL database
  • scales to massive workloads; distributed "serverless" database
  • millions of requests per second, trillions of rows, 100s of TB of storage
  • fast and consistent in performance
  • single-digit millisecond latency - low latency retrieval
  • integrated with IAM
  • low cost and auto scaling capacity
  • key/value database with a primary key made of a partition key and an optional sort key
  • Global Tables
    • Active-Active replication (read/write to any Region)
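
To make the partition key / sort key idea concrete, here is a small sketch. The dictionary follows the shape of the keyword arguments accepted by boto3's DynamoDB `create_table` call, but nothing is sent to AWS; the table name, attribute names, and the `primary_key_of` helper are hypothetical.

```python
# Keyword arguments in the shape boto3's DynamoDB create_table call accepts.
table_definition = {
    "TableName": "GameScores",  # hypothetical table name
    "AttributeDefinitions": [
        {"AttributeName": "player_id", "AttributeType": "S"},
        {"AttributeName": "played_at", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "player_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "played_at", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
}


def primary_key_of(item, definition):
    """Return the (partition, sort) tuple that uniquely identifies an item."""
    names = [key["AttributeName"] for key in definition["KeySchema"]]
    return tuple(item[name] for name in names)


item = {"player_id": "p-42", "played_at": 1719619200, "score": 9001}
print(primary_key_of(item, table_definition))
```

Two items may share a partition key as long as their sort keys differ; the pair together must be unique.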

DynamoDB Accelerator - DAX

  • fully managed in-memory cache for DynamoDB
  • up to 10x performance improvement
  • secure, highly scalable & highly available
  • DAX works only with DynamoDB and is integrated with it
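
The caching idea behind DAX can be sketched in plain Python as a read-through, write-through cache in front of a slower table. This is only an illustration under simplifying assumptions (a dict stands in for the DynamoDB table, and there is no expiry or eviction); the `ReadThroughCache` class is hypothetical and is not the DAX client API.

```python
class ReadThroughCache:
    """Serve repeated reads from memory, falling back to the table on a miss."""

    def __init__(self, table):
        self.table = table  # stand-in for the backing DynamoDB table
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.table.get(key)  # the slower round trip to the table
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write through: update the table and the cached copy together.
        self.table[key] = value
        self.cache[key] = value


table = {"user#1": {"name": "Ana"}}
dax_like = ReadThroughCache(table)
dax_like.get("user#1")  # miss: fetched from the table, then cached
dax_like.get("user#1")  # hit: served from memory
print(dax_like.hits, dax_like.misses)
```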

Redshift

  • PostgreSQL-based
  • OLAP - online analytical processing (analytics and data warehouse)
  • data is typically loaded in batches (for example once an hour), not every second
  • AWS claims 10x better performance than other data warehouses; scales to PBs of data
  • Columnar storage of data instead of row based
  • massively parallel query execution, highly available
  • pay as you go based on the instance provisioned
  • has a SQL interface to perform queries
  • BI tools such as Amazon QuickSight or Tableau integrate with it
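
Why columnar storage suits analytical queries can be shown with a toy example. This is a sketch in plain Python, not Redshift's storage format; the sample rows and the `to_columns` helper are made up for illustration.

```python
rows = [
    {"order_id": 1, "region": "eu", "amount": 20.0},
    {"order_id": 2, "region": "us", "amount": 35.5},
    {"order_id": 3, "region": "eu", "amount": 12.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


columns = to_columns(rows)

# Row layout: every row is loaded in full just to read one field.
total_from_rows = sum(row["amount"] for row in rows)

# Columnar layout: only the "amount" column is touched.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

An aggregate over one column reads a single contiguous list in the columnar layout, which is the access pattern OLAP queries tend to have.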

Redshift Serverless

  • use case: reporting, dashboarding applications, real-time analytics
  • pay only for what you use
  • run analytics workloads without managing data warehouse infrastructure

Elastic MapReduce (EMR)

  • helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • supports Apache Spark, HBase, Presto, Flink
  • Auto-scaling
  • use cases: data processing, machine learning, web indexing, big data
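
The MapReduce model that gives EMR its name can be sketched on a single machine. The example below is a word count in plain Python, assuming the three classic phases (map, shuffle, reduce); on a real cluster the framework distributes these phases across nodes, and the function names here are illustrative only.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


documents = ["big data on AWS", "Big clusters process big data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts["big"], word_counts["data"])
```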

Amazon Athena

  • Serverless query service to perform analytics against S3 objects
  • uses standard SQL language to query files
  • supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
  • Pricing: $5 per TB of data scanned
  • use compressed or columnar data for cost savings
  • use case: BI analytics, reporting, analyze & query VPC Flow Logs, ELB logs
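
A rough cost estimate follows directly from the per-TB price above. The sketch below uses the $5 per TB figure from these notes and assumes decimal terabytes; the data sizes and the 10% scan ratio for columnar data are hypothetical, and current pricing should be checked on the AWS pricing page.

```python
PRICE_PER_TB_USD = 5.0   # figure quoted in the notes above
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabytes


def scan_cost_usd(bytes_scanned):
    """Estimate the cost of one query from the bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


raw_csv_bytes = 2 * BYTES_PER_TB      # hypothetical: query scans a full 2 TB of CSV
columnar_bytes = raw_csv_bytes // 10  # hypothetical: same query reads 10% as columnar data

print(scan_cost_usd(raw_csv_bytes))
print(scan_cost_usd(columnar_bytes))
```

Because the bill depends on bytes scanned, anything that lets a query read less data (compression, columnar formats, fewer selected columns) lowers the cost proportionally.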

NoSQL

DocumentDB

  • MongoDB-compatible document database (what Aurora is to MySQL/PostgreSQL, DocumentDB is to MongoDB)
  • similar concepts to Aurora
  • storage automatically grows in increments of 10 GB

Amazon Neptune

  • Graph Database
  • highly available across 3 AZs, up to 15 read replicas
  • use case: graphs, fraud detection, recommendation engines, social networking
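
The recommendation-engine use case comes down to walking relationships. The sketch below does a two-hop "who do the people I follow also follow" traversal over an in-memory graph; it is plain Python rather than a Neptune query, and the graph data and `recommend` function are made up for illustration.

```python
from collections import Counter

follows = {  # hypothetical social graph: person -> people they follow
    "ana": {"ben", "cara"},
    "ben": {"cara", "dev"},
    "cara": {"dev", "eli"},
    "dev": set(),
    "eli": set(),
}


def recommend(graph, person):
    """Rank people two hops away that `person` does not follow yet."""
    scores = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            if candidate != person and candidate not in graph[person]:
                scores[candidate] += 1
    return [name for name, _ in scores.most_common()]


print(recommend(follows, "ana"))
```

A graph database is built to make this kind of multi-hop traversal cheap, where a relational schema would need repeated self-joins.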

Amazon Timestream

  • serverless time series database
  • store and analyze trillions of events
  • AWS claims queries hundreds of times faster at about 1/10 the cost of relational databases such as RDS

Amazon QLDB (Quantum Ledger Database)

  • a ledger is a book recording financial transactions
  • immutable system: no entry can be removed or modified; cryptographically verifiable
  • 2-3x better performance than common ledger blockchain frameworks; data is manipulated using SQL
  • no decentralization component, unlike Amazon Managed Blockchain
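
What "immutable and cryptographically verifiable" means can be illustrated with a simple hash chain. This is a teaching sketch using SHA-256 from the Python standard library, not QLDB's actual journal implementation; the function names and sample transactions are hypothetical.

```python
import hashlib
import json


def entry_hash(payload, previous_hash):
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True) + previous_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger, payload):
    """Add an entry whose hash depends on the whole history so far."""
    previous_hash = ledger[-1]["hash"] if ledger else ""
    ledger.append({"payload": payload,
                   "hash": entry_hash(payload, previous_hash)})


def verify(ledger):
    """Recompute every hash; any edited entry breaks the chain."""
    previous_hash = ""
    for entry in ledger:
        if entry["hash"] != entry_hash(entry["payload"], previous_hash):
            return False
        previous_hash = entry["hash"]
    return True


ledger = []
append(ledger, {"from": "alice", "to": "bob", "amount": 50})
append(ledger, {"from": "bob", "to": "carol", "amount": 20})
print(verify(ledger))
ledger[0]["payload"]["amount"] = 5000  # tamper with history
print(verify(ledger))
```

Because each hash covers the previous one, changing an old entry invalidates every hash after it, so tampering is detectable.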

Amazon Managed Blockchain

  • join public blockchain network
  • create own scalable private network
  • compatible with Hyperledger Fabric or Ethereum

AWS Glue

  • managed extract, transform, and load (ETL) service
  • useful to prepare and transform data for analytics
  • fully serverless service

DMS - Database Migration Service

  • quickly and securely migrates databases to AWS; resilient and self-healing
  • supports
    • homogeneous migrations: Oracle to Oracle
    • heterogeneous migrations: Microsoft SQL Server to Aurora

Summary

  • Relational databases - OLTP: RDS & Aurora
  • Differences between Multi-AZ, Read Replicas, and Multi-Region
  • In-memory database: ElastiCache
  • K/V databases: DynamoDB (serverless) & DAX (cache for DynamoDB)
  • Warehouse - OLAP: Redshift(SQL)
  • Hadoop Cluster: EMR
  • Athena: query data on S3 with SQL capability
  • QuickSight: dashboard on data (serverless)
  • DocumentDB: Aurora for MongoDB
  • Amazon QLDB: Financial Transaction Ledger
  • Amazon Managed Blockchain: Hyperledger Fabric & Ethereum blockchains
  • Glue: ETL and data catalog service
  • Database Migration: DMS